English-Hindi Transliteration using Multiple Similarity Metrics
نویسندگان
چکیده
In this paper, we present an approach to measure the transliteration similarity of English-Hindi word pairs. Our approach has two components. First we propose a bi-directional mapping between one or more characters in the Devanagari script and one or more characters in the Roman script (pronounced as in English). This allows a given Hindi word written in Devanagari to be transliterated into the Roman script and vice-versa. Second, we present an algorithm for computing a similarity measure that is a variant of Dice’s coefficient measure and the LCSR measure and which also takes into account the constraints needed to match English-Hindi transliterated words. Finally, by evaluating various similarity metrics individually and together under a multiple measure agreement scenario, we show that it is possible to achieve a 0.92 f-measure in identifying English-Hindi word pairs that are transliterations. In order to assess the portability of our approach to other similar languages we adapt our system to the Gujarati language.
منابع مشابه
English-Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009
This paper presents English—Hindi transliteration in the NEWS 2009 Machine Transliteration Shared Task adding source context modeling into state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framew...
متن کاملHindi Transliteration Using Context - Informed PB - SMT : the DCU System for NEWS 2009
This paper presents English—Hindi transliteration in the NEWS 2009 Machine Transliteration Shared Task adding source context modeling into state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framew...
متن کاملA Hybrid Approach of English- Hindi Named-entity Transliteration
In recent years, machine transliteration has gained a center of attention for research. Both machine translation and transliteration are important for e-governance and web based online multilingual applications. As machine translation translate source language to target language which results in wrong translation for named entities. Named entities are required to be translated with preserving t...
متن کاملOptimizing Transliteration for Hindi/Marathi to English Using only Two Weights
Machine transliteration has received significant research attention in last two decades. It is observed that Hindi to English and Marathi to English named entity machine transliteration is comparably less studied. Currently, research work in this domain is carried out by using grapheme based statistical approaches. But, to achieve better accuracy for the transliteration, an adequate bilingual t...
متن کاملHindi and Marathi to English Cross Language Information
In this paper, we present our Hindi ->English and Marathi ->English CLIR systems developed as part of our participation in the CLEF 2007 Ad-Hoc Bilingual task. We take a query translation based approach using bi-lingual dictionaries. Query words not found in the dictionary are transliterated using a simple lookup table based transliteration approach. The resultant transliteration is then compar...
متن کامل